Software Prepromotion for Non-Uniform Cache Architecture

Authors

  • Junjie Wu
  • Xiaohui Pan
  • Xuejun Yang

Abstract

As a solution to growing global wire delay, non-uniform cache architecture (NUCA) has become a trend in large cache designs. The access time of NUCA is determined by the distance between the processor and the cache bank that holds the requested data. An important line of NUCA research therefore focuses on placing data that is about to be used into cache banks close to the processor. This paper proposes a software prepromotion technique, which prepromotes data using prepromotion instructions, much as software prefetching does. Besides basic software prepromotion, this paper also proposes smart multihop software prepromotion (SMSP), very long software prepromotion (VLSP), and a combination of the two. SMSP intelligently chooses the cache banks best suited to receive the prepromoted data, and VLSP prepromotes multiple data items with a single instruction. Finally, we evaluate our approaches on seven kernel benchmarks using a full-system simulator. Basic software prepromotion improves IPC by 2.6893% on average, SMSP by 7.0928%, and VLSP by 7.2194%. Combining SMSP and VLSP raises the average IPC improvement to 11.8650%.
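The core idea (NUCA latency grows with bank distance, and software can move data toward the processor before it is needed) can be illustrated with a toy model. This is a minimal Python sketch under loudly stated assumptions, not the authors' implementation or instruction set: banks form a one-dimensional chain indexed by distance, latencies are made-up constants, banks have unlimited capacity (so moves never swap or evict), and every name (`ToyNUCA`, `prepromote`, `prepromote_to`, `prepromote_range`, `BASE_LATENCY`, `HOP_LATENCY`) is hypothetical. The SMSP-like and VLSP-like methods only mimic the behaviour the abstract describes (multi-hop moves, several blocks per instruction); the real SMSP bank-selection policy is not given here, so the caller supplies the target bank.

```python
# Toy model of a one-dimensional NUCA bank chain with software prepromotion.
# Hypothetical assumptions, not the paper's design:
#   * banks are indexed 0..N-1 by distance from the processor (0 = closest);
#   * accessing bank i costs BASE_LATENCY + i * HOP_LATENCY cycles;
#   * banks have unlimited capacity, so moving a block never swaps or evicts;
#   * a demand hit promotes the block one bank closer (gradual promotion).

BASE_LATENCY = 3  # hypothetical cycles to reach the closest bank
HOP_LATENCY = 2   # hypothetical extra cycles per bank of distance


class ToyNUCA:
    def __init__(self, num_banks):
        self.num_banks = num_banks
        self.location = {}  # block address -> index of the bank holding it

    def place(self, addr, bank):
        """Put a block in a given bank (e.g. where it lands on a fill)."""
        if not 0 <= bank < self.num_banks:
            raise ValueError("bank index out of range")
        self.location[addr] = bank

    def access(self, addr):
        """Demand access: return its latency, then promote the block one hop."""
        bank = self.location[addr]
        latency = BASE_LATENCY + bank * HOP_LATENCY
        self.location[addr] = max(0, bank - 1)
        return latency

    def prepromote(self, addr, hops=1):
        """Basic prepromotion: move a block `hops` banks closer before it is used."""
        self.location[addr] = max(0, self.location[addr] - hops)

    def prepromote_to(self, addr, target_bank):
        """Multihop variant (SMSP-like): jump straight to a chosen bank.

        Here the caller picks target_bank; SMSP chooses it automatically by a
        policy not described in the abstract. Never moves a block farther away.
        """
        self.location[addr] = min(self.location[addr], target_bank)

    def prepromote_range(self, addrs, target_bank):
        """Multi-block variant (VLSP-like): one call covers several blocks."""
        for addr in addrs:
            self.prepromote_to(addr, target_bank)


def demo():
    """Total latency of touching four far-away blocks, without and with prepromotion."""
    cold = ToyNUCA(num_banks=8)
    warm = ToyNUCA(num_banks=8)
    for addr in range(4):
        cold.place(addr, 7)
        warm.place(addr, 7)
    warm.prepromote_range(range(4), target_bank=1)  # issued well before the uses
    cold_cycles = sum(cold.access(a) for a in range(4))
    warm_cycles = sum(warm.access(a) for a in range(4))
    return cold_cycles, warm_cycles


print(demo())  # (68, 20) under the made-up latencies above
```

The model deliberately ignores bank capacity, the cost of the prepromotion instructions themselves, and timing overlap; it only shows why moving blocks toward the processor ahead of use shortens the later demand accesses, and why a multi-hop or multi-block prepromotion covers more distance per instruction than a one-hop move.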

Journal:
  • JSW

Volume 5

Published 2010